DeepNLP Blog: Machine learning, NLP, CV, and related technical fields.

Introduction to Multimodal Generative Models: Model Architecture, Key Features, and Code
rockingdingo 2024-02-24 #Multimodal Generative Models #AIGC #Large Language Model
In this post, we give a brief introduction to what multimodal models are and what multimodal generative models can accomplish. In February 2024, OpenAI released Sora, its latest text-to-video multimodal generative model, which quickly became extremely popular. Sora can generate short videos up to one minute long. Before Sora, various companies and research groups had already released many multimodal generative models, such as BLIP, BLIP-2, Flamingo, and FLAVA. We summarize a list of these time-tested multimodal generative models and introduce their model architectures (text and image encoders), training processes, tasks, loss functions (written as LaTeX equations), and vision-language capabilities such as text-to-image, text-to-video, text-to-audio, and visual question answering.
